successor feature
- Europe > Austria (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Research Report > Experimental Study (1.00)
- Workflow (0.67)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
- North America > Canada > Quebec > Montreal (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Education > Educational Setting (0.45)
- Health & Medicine > Therapeutic Area > Neurology (0.45)
Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration
The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solution to a given problem, making them brittle to problem variations. Replicating human flexibility in reinforcement learning agents is the challenge that we explore in this work.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > California (0.04)
- Europe > Italy > Sardinia (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Distributional Successor Features Enable Zero-Shot Policy Optimization
Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, policy optimization with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features within the dataset. By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions. We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.
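The claim that successor features reduce policy evaluation under a new reward to linear regression can be illustrated with a small tabular sketch. This is not the DiSPOs method and involves no diffusion model; it only shows the standard identity for a fixed policy, assuming rewards that are linear in the features. The transition matrix `P`, the feature matrix `Phi`, the discount `gamma`, and the function names are all hypothetical values chosen for illustration.

```python
import numpy as np

def successor_features(P, Phi, gamma):
    """Expected discounted sum of features under a fixed policy.

    Solves Psi = Phi + gamma * P @ Psi, i.e. Psi = (I - gamma * P)^{-1} Phi.
    """
    n_states = P.shape[0]
    return np.linalg.solve(np.eye(n_states) - gamma * P, Phi)

def fit_reward_weights(Phi, rewards):
    """Least-squares fit of w such that rewards ~= Phi @ w."""
    w, *_ = np.linalg.lstsq(Phi, rewards, rcond=None)
    return w

# Hypothetical 3-state Markov chain induced by some fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.2, 0.0, 0.8]])
# Hypothetical 2-dimensional state features (one row per state).
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
gamma = 0.9

# Computed once, independent of any reward function.
Psi = successor_features(P, Phi, gamma)

# A "new" task: observed per-state rewards, assumed linear in the features.
rewards = Phi @ np.array([1.0, -2.0])

# Policy evaluation for the new task: regress the weights, then one matrix product.
w_hat = fit_reward_weights(Phi, rewards)
values = Psi @ w_hat

# Reference: direct policy evaluation with the same rewards.
values_direct = np.linalg.solve(np.eye(3) - gamma * P, rewards)
```

Here `Psi` is reused unchanged for any reward that is linear in `Phi`; only the regression for `w_hat` depends on the task. The sketch evaluates a single fixed policy and does not address the policy optimization problem that the abstract describes as challenging.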